update eval-driven-dev skill #1434

Open
yiouli wants to merge 3 commits into github:staged from yiouli:staged

Conversation

yiouli (Contributor) commented Apr 17, 2026

Pull Request Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have read and followed the Guidance for submissions involving paid services.
  • My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
  • The file follows the required naming convention.
  • The content is clearly structured and follows the example format.
  • I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
  • I have run npm start and verified that README.md is up to date.
  • I am targeting the staged branch for this pull request.

Description

Update the eval-driven-dev skill: add a comprehensive analysis step after evaluation runs.


Type of Contribution

  • New instruction file.
  • New prompt file.
  • New agent file.
  • New plugin.
  • New skill file.
  • New agentic workflow.
  • Update to existing instruction, prompt, agent, plugin, skill, or workflow.
  • Other (please specify):

Additional Notes


By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

Copilot AI review requested due to automatic review settings April 17, 2026 22:29
@yiouli yiouli requested a review from aaronpowell as a code owner April 17, 2026 22:29
github-actions bot (Contributor) commented Apr 17, 2026

🔍 Skill Validator Results

⚠️ Warnings or advisories found

| Scope | Checked |
| --- | ---: |
| Skills | 1 |
| Agents | 1 |
| Total | 2 |

| Severity | Count |
| --- | ---: |
| ❌ Errors | 0 |
| ⚠️ Warnings | 2 |
| ℹ️ Advisories | 0 |

Summary

| Level | Finding |
| --- | --- |
| ℹ️ | Found 1 skill(s) |
| ℹ️ | [eval-driven-dev] 📊 eval-driven-dev: 3,768 BPE tokens [chars/4: 4,311] (standard ~), 16 sections, 1 code blocks |
| ℹ️ | [eval-driven-dev] ⚠ Skill is 3,768 BPE tokens (chars/4 estimate: 4,311) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [eval-driven-dev] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably. |
| ℹ️ | ✅ All checks passed (1 skill(s)) |
Full validator output

```text
Found 1 skill(s)
[eval-driven-dev] 📊 eval-driven-dev: 3,768 BPE tokens [chars/4: 4,311] (standard ~), 16 sections, 1 code blocks
[eval-driven-dev] ⚠ Skill is 3,768 BPE tokens (chars/4 estimate: 4,311) — approaching "comprehensive" range where gains diminish.
[eval-driven-dev] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
✅ All checks passed (1 skill(s))
```
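The validator reports two numbers for the skill: a BPE token count and a `chars/4` estimate. As a rough illustration of how the two relate, here is a minimal sketch that assumes the `chars/4` figure is the character count divided by four and rounded down. The validator's actual implementation is not shown in this thread, and both function names are hypothetical.

```python
def estimate_tokens_chars4(text: str) -> int:
    """Approximate a token count as one token per four characters (assumed heuristic)."""
    return len(text) // 4


def estimate_gap(bpe_tokens: int, chars4_estimate: int) -> float:
    """Relative difference of the chars/4 estimate versus the BPE count."""
    return (chars4_estimate - bpe_tokens) / bpe_tokens


if __name__ == "__main__":
    # A 17,244-character document yields the 4,311 estimate quoted above.
    print(estimate_tokens_chars4("a" * 17244))
    # Figures taken from the validator output: 3,768 BPE tokens vs. 4,311 estimated.
    print(round(estimate_gap(3768, 4311), 3))
```

With the figures reported above, the heuristic overshoots the BPE count by roughly 14%, which is why the validator prints both values.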

Copilot AI (Contributor) left a comment

Pull request overview

Updates the eval-driven-dev skill to align with newer pixie-qa concepts (notably input_data, agent evaluators, and structured post-run analysis) and expands the skill’s references to include a dedicated Step 6 analysis workflow and runnable implementation examples.

Changes:

  • Updates the skill metadata and setup workflow to target pixie-qa >=0.8.1,<0.9.0 and revises setup/error-handling guidance.
  • Refactors the skill’s step-by-step reference docs (new Step 1a project analysis, split Step 2, new Step 6 “Analyze Outcomes”, removal of older combined/iteration docs).
  • Adds runnable examples (standalone function, FastAPI, CLI) and updates API reference docs to reflect the newer dataset shapes (input_data etc.).
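The version constraint `>=0.8.1,<0.9.0` mentioned in the first bullet pins the skill to the 0.8.x line starting at 0.8.1. The following is a minimal sketch of what that range accepts, assuming plain dotted numeric versions only. The helper names are hypothetical and this is not how `setup.sh` performs the check; real tooling would normally rely on pip or `packaging.specifiers.SpecifierSet`, which also handle pre-releases.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as '0.8.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_supported_range(version: str, lower: str = "0.8.1", upper: str = "0.9.0") -> bool:
    """Return True when lower <= version < upper (assumed plain numeric versions)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("0.8.0", "0.8.1", "0.8.10", "0.9.0"):
        print(candidate, in_supported_range(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.8.10"` would sort before `"0.8.2"`, while the tuple comparison orders it correctly.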

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| skills/eval-driven-dev/resources/setup.sh | Updates install/upgrade logic and adds stricter failure handling for pixie init/start. |
| skills/eval-driven-dev/references/wrap-api.md | Updates wrap API reference and CLI wording (including dataset field naming). |
| skills/eval-driven-dev/references/testing-api.md | Updates testing API reference to match new dataset schema and runner behavior. |
| skills/eval-driven-dev/references/evaluators.md | Adds create_agent_evaluator reference and updates evaluator selection guidance. |
| skills/eval-driven-dev/references/6-investigate.md | Removes prior Step 6 “investigate/iterate” reference. |
| skills/eval-driven-dev/references/6-analyze-outcomes.md | Adds new structured, multi-phase Step 6 analysis workflow and required outputs. |
| skills/eval-driven-dev/references/5-run-tests.md | Reframes Step 5 as “run tests and fix mechanical issues” and updates commands/content. |
| skills/eval-driven-dev/references/4-build-dataset.md | Updates dataset schema (input_data), adds realism audits, and expands capture guidance. |
| skills/eval-driven-dev/references/3-define-evaluators.md | Shifts evaluator strategy toward agent evaluators and updates mapping guidance. |
| skills/eval-driven-dev/references/2c-capture-and-verify-trace.md | Adds a dedicated sub-step doc for trace capture and verification. |
| skills/eval-driven-dev/references/2b-implement-runnable.md | Adds a dedicated sub-step doc for Runnable implementation and placement. |
| skills/eval-driven-dev/references/2a-instrumentation.md | Adds a dedicated sub-step doc for wrap() instrumentation practices. |
| skills/eval-driven-dev/references/2-wrap-and-trace.md | Removes older combined Step 2 reference. |
| skills/eval-driven-dev/references/1-c-eval-criteria.md | Adds updated eval criteria guidance tied to project analysis and failure modes. |
| skills/eval-driven-dev/references/1-b-eval-criteria.md | Removes older Step 1b eval criteria reference. |
| skills/eval-driven-dev/references/1-b-entry-point.md | Renumbers/updates entry-point documentation and emphasizes capability prioritization. |
| skills/eval-driven-dev/references/1-a-project-analysis.md | Adds new Step 1a project analysis reference and required outputs. |
| skills/eval-driven-dev/references/runnable-examples/standalone-function.md | Adds runnable example for direct function invocation. |
| skills/eval-driven-dev/references/runnable-examples/fastapi-web-server.md | Adds runnable example for FastAPI/ASGI in-process evaluation. |
| skills/eval-driven-dev/references/runnable-examples/cli-app.md | Adds runnable example for CLI subprocess execution. |
| skills/eval-driven-dev/SKILL.md | Updates skill description/versioning and rewrites the step flow to include analysis. |
| docs/README.skills.md | Updates the skills index entry for eval-driven-dev to match the new references list. |

Comment thread skills/eval-driven-dev/resources/setup.sh Outdated
Comment thread skills/eval-driven-dev/SKILL.md
Comment thread skills/eval-driven-dev/references/wrap-api.md
Comment thread skills/eval-driven-dev/references/testing-api.md
Comment thread skills/eval-driven-dev/references/3-define-evaluators.md
Comment thread skills/eval-driven-dev/references/4-build-dataset.md
Comment thread skills/eval-driven-dev/resources/setup.sh
Comment thread skills/eval-driven-dev/resources/setup.sh
Comment thread skills/eval-driven-dev/SKILL.md
Comment thread skills/eval-driven-dev/references/evaluators.md